Using Information Extraction to Classify Newspapers Advertisements

Authors

  • Ramón Aragüés Peleato
  • Jean-Cédric Chappelier
  • Martin Rajman
Abstract

This paper presents a text classification procedure that has been developed in the context of an information extraction project. In the prototype that has been developed for this project, newspaper advertisements are processed by three main modules: first of all, a classification module associates a category to the advertisement. Then, a tagging module identifies textual information units that are related to the associated category, and finally a predefined form for that category is filled with the tagged text. The classification module, which is the main focus of this paper, consists in using a naive Bayes classifier and, at the same time, trying to fill all the predefined forms associated with all categories. Results of both methods (classification probabilities and filling scores) are then combined to provide a final classification decision. This mixed classification method is described and evaluated on the basis of concrete experiments carried out on real data. The purpose of the presented experiments is to precisely evaluate the impact of the information extraction step on classification accuracy. As one could reasonably expect, classification relying on information extraction alone doesn’t perform very well but when used as a complement to the statistical approach it significantly improves the classification results.
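The mixed decision described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the classification probabilities and filling scores are combined, so the weighted sum below is an assumption, not the authors' rule. The categories, form fields, regular expressions, training texts, and the names `FORMS`, `filling_scores`, and `classify` are likewise hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical forms: category -> {field name: regex that "fills" the field}.
FORMS = {
    "real_estate": {"rooms": r"\b\d+\s*rooms?\b", "rent": r"\b\d+\s*(?:chf|eur)\b"},
    "vehicles": {"mileage": r"\b\d+\s*km\b", "year": r"\b(?:19|20)\d{2}\b"},
}

def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())

def train_naive_bayes(examples):
    """Collect document and word counts from (text, category) pairs."""
    doc_counts, word_counts, vocab = Counter(), defaultdict(Counter), set()
    for text, cat in examples:
        doc_counts[cat] += 1
        for tok in tokenize(text):
            word_counts[cat][tok] += 1
            vocab.add(tok)
    return doc_counts, word_counts, vocab

def nb_posteriors(text, model):
    """Multinomial naive Bayes with add-one smoothing; returns normalized posteriors."""
    doc_counts, word_counts, vocab = model
    total_docs = sum(doc_counts.values())
    log_scores = {}
    for cat in doc_counts:
        total_words = sum(word_counts[cat].values())
        score = math.log(doc_counts[cat] / total_docs)
        for tok in tokenize(text):
            if tok in vocab:  # tokens never seen in training are ignored
                score += math.log((word_counts[cat][tok] + 1) / (total_words + len(vocab)))
        log_scores[cat] = score
    top = max(log_scores.values())
    norm = sum(math.exp(s - top) for s in log_scores.values())
    return {cat: math.exp(s - top) / norm for cat, s in log_scores.items()}

def filling_scores(text):
    """Fraction of each category's form fields that can be filled from the text."""
    lowered = text.lower()
    return {
        cat: sum(bool(re.search(pattern, lowered)) for pattern in fields.values()) / len(fields)
        for cat, fields in FORMS.items()
    }

def classify(text, model, weight=0.5):
    """Assumed combination rule: weighted sum of posterior and filling score."""
    probs = nb_posteriors(text, model)
    fills = filling_scores(text)
    combined = {c: (1 - weight) * probs[c] + weight * fills.get(c, 0.0) for c in probs}
    return max(combined, key=combined.get), combined

# Toy training data, invented for this sketch.
TRAIN = [
    ("bright flat with balcony for rent", "real_estate"),
    ("apartment near station quiet area", "real_estate"),
    ("used car in good condition", "vehicles"),
    ("motorbike for sale new tires", "vehicles"),
]
MODEL = train_naive_bayes(TRAIN)
```

With this toy data, `classify("for rent 1998 85000 km", MODEL, weight=0.0)` relies on the statistical model alone and picks `real_estate` because of the word "rent", while the default `weight=0.5` lets the filled mileage and year fields tip the decision to `vehicles`. This mirrors, in a very reduced form, the idea of using extraction as a complement to the statistical classifier.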

Similar resources

Document Analysis And Classification Based On Passing Window

In this paper we present a document analysis and classification system to segment and classify the contents of Arabic document images. This system includes preprocessing, document segmentation, feature extraction and document classification. A document image is enhanced in preprocessing by removing noise, binarizing, and detecting and correcting image skew. In document segmentation, an algorith...

A Unified Framework for Information Extraction from Newspaper Images

Nowadays newspapers are a very common source of information that is easily available to all. They contain all sorts of news, such as social and political news, and many advertisements. These advertisements/announcements are concentrated on some specific page. This paper proposes a system that can extract contact information like email address, website address and telephone number from newspape...

A Study of Information Extraction Tools for Online English Newspapers (PDF): Comparative Analysis

Information retrieval is the task of retrieving relevant and useful information from e-newspapers. Electronic newspapers are electronic replicas of traditional newspapers. E-newspapers are becoming increasingly popular because of the ease and convenience in accessing them. Newspapers are the source of timely information. These are the documents comprising news items and several independent info...

UV tanning advertisements in high school newspapers.

OBJECTIVE To examine the increasing use of UV tanning parlors by adolescents, despite the World Health Organization recommendation that no one under the age of 18 years use UV tanning devices. DESIGN We examined tanning advertisements in a sample of public high school newspapers published between 2001 and 2005 in 3 Colorado counties encompassing the Denver metropolitan area. RESULTS Tanning...

Promotion of waterpipe tobacco use, its variants and accessories in young adult newspapers: a content analysis of message portrayal.

The objective of our study was to identify waterpipe tobacco smoking advertisements and those that promoted a range of products and accessories used to smoke waterpipe tobacco. The content of these advertisements was analyzed to understand the messages portrayed about waterpipe tobacco smoking in young adult (aged 18-30) newspapers. The study methods include monitoring of six newspapers targeti...


Publication date: 2000